Abstract

The ability to trace multicast paths is currently available in
the Internet by means of IGMP MTRACE packets. We introduce
Tracer, the first protocol that organizes the receivers
of a multicast group deterministically into a logical tree
structure while maintaining exact packet-loss correlation for
local error recovery, and without requiring any changes to
existing multicast routing protocols. Tracer uses MTRACE
packets in IGMP to allow a receiver host to obtain its path
to the source of a multicast group. Receivers use the multicast
path information to determine how to achieve local
error recovery and effective congestion control. We compare
the tracing approach with prior mechanisms that attempt
local recovery. Measurements carried out over CAIRN
illustrate that tracing multicast paths is an effective tool
for organizing receivers based on their packet-loss
correlation.

1  Introduction 

As  support  in  the  Internet  for  multicast,  or  one-to-many, 
communication  between  a  group  of  hosts  continues  to  grow, 
applications  taking  advantage  of  the  savings  of  multicast 
transmissions  increase  in  number.  Such  applications  include 
server-initiated  “push”  technology,  shared  whiteboards  and 
multimedia  conferencing  tools,  and  distributed  interactive 
simulation  environments.  While  many  unicast-based  net¬ 
worked  applications  have  taken  reliable  data  transmission 
for  granted,  reliable  transmission  of  multicast  data  is  still 
an  unresolved  issue. 

What prevents the wide-scale deployment of reliable multicast
protocols is the acknowledgment-implosion problem:
as the number of receivers expecting reliable transmission
of data multicast from a source increases, the source and
the network become overwhelmed when receivers directly
contact the source with either negative acknowledgments
(Nacks) requesting lost data, together with the retransmissions
that ensue during packet loss, or positive acknowledgments (Acks)
of correctly received data. Producing a reliable multicast
protocol that scales well with the number of receivers, in
terms of network traffic and the processing required of the
source and receivers, has proven to be a challenge, as demonstrated
by the number of approaches taken in the past (e.g.,
[17, 5, 23, 9, 11, 7, 15]). Moreover, as protocols for multicast
error control are developed, mechanisms must also be
developed for multicast congestion control [6, 13, 3], similar
to those developed for such unicast reliable protocols as
TCP [8].

One technique that has been used in the past by reliable
multicast protocols is local recovery, designating one
or more hosts other than the source (usually one of the
receivers) as another source of retransmissions when the
network resources permit it. For instance, the Scalable Reliable
Multicast (SRM) [5] protocol allows any receiver to
respond to a Nack, and the Reliable Multicast Transport
Protocol (RMTP) [17] appoints “designated receivers”,
organized into a tree structure, for sending retransmissions to
portions of the receiver set. Organizing the receivers into a
tree structure, where each receiver is responsible for a set
number of other receivers, has been shown via an analytical
model to be the most scalable choice among several
methods [19, 10]. Intuitively, reliable multicast protocols that
organize trees of receivers for local recovery work well
because they distribute the cost of processing Acks, Nacks,
and retransmissions, which reduces the load on the source
and the network. The question is how to establish such an
organized local recovery hierarchy efficiently.

We present the first method for organizing multicast
receivers deterministically to ensure that Acks and Nacks are
always sent to the best receivers, in terms of packet-loss
correlation or a combination of performance considerations.
Section 2 discusses past techniques used to organize the
receiver set for local recovery, and considers the limitations
of using each technique alone and in combination.
Section 3 introduces a novel approach for organizing receivers,
tracing the path from the source, which has heretofore been
overlooked as an immediately available tool built into multicast
routers. Our protocol for organizing the receiver set, called Tracer,
is based on tracing as a prediction of packet-loss
correlation. We view Tracer as a protocol component for use in
reliable multicast or multicast congestion control protocols.
Section 3 also discusses small local improvements to routers
that would allow subtree multicasting, or subcasting, which
would greatly improve the efficiency of reliable multicast
protocols, and shows why Tracer is particularly poised to
take advantage of these improvements. Section 4 details
how Tracer can be used as a component of larger multicast
protocols. In particular, we consider the Reliable Multicast
Transport Protocol (RMTP) [17], the Structure-Oriented
Resilient Multicast Protocol (STORM) [22], and De Lucia
and Obraczka’s multicast congestion control scheme [3] as
examples. Section 5 analyzes data collected on packet loss
between several hosts on the Collaborative Advanced Internet
Research Network (CAIRN), a real network, in order
to illustrate how well tracing assists in predicting packet-loss
correlation, and therefore receiver organization. Finally,
Section 6 presents our conclusions and explains how to download a
public domain implementation of Tracer that we have built.

2  Motivation 


Much  research  in  multicast  error-control  has  converged  to 
the  notion  of  local  aggregation  of  feedback  and  local  recov¬ 
ery  when  the  network  permits  it,  which  implies  the  definition 
of  local  groups  of  receivers.  Current  research  in  congestion 
control  also  seems  to  require  a  notion  of  groups,  such  that 
all  members  within  a  group  have  the  same  state  of  conges¬ 
tion  or  packet-loss  correlation.  The  dynamic  organization 
of  group  members  is  a  crucial  component  of  both  error  con¬ 
trol  and  congestion  control,  and  several  methods  have  been 
used in the past for both approaches. These include measuring
propagation delay or hop counts, exchanging packet-loss
information, and providing support at multicast
routers. In this section, we consider each method and its
limitations, and explain why it may not be applicable, even in
combination with the others.

For  clarity,  we  refer  to  receivers  in  error-control  protocols 
sending  Acks  or  Nacks  as  requesters  and  receivers  that  may 
respond  with  retransmissions  as  retransmitters.  The  set  of 
retransmitters  does  not  exclude  the  source.  We  may  also 
think  of  a  parent  and  child  relationship  between  retransmit¬ 
ter  and  requester.  The  rest  of  this  section  examines  existing 
approaches  to  grouping  receivers  while  trying  to  preserve 
packet-loss  correlation. 

2.1  Measuring  hop  count 

One of the main goals of selecting a retransmitter for each
requester of a reliable multicast protocol is minimizing the
delay in receiving retransmissions. By measuring
the number of routers on the path between two hosts,
i.e., the hop count, it is thought that the closest retransmitter
(and if possible a retransmitter in the same administrative
domain as the requester) can be located. For example,
the Tree-based Multicast Transport Protocol (TMTP) [23]
builds a tree of receivers by finding the closest retransmitter
to each receiver based on hop counts between hosts.
Figures 1 and 2 illustrate a multicast routing tree and three
problems that can arise by using hop count as a measure in
the way it is used in TMTP. First, in Figure 1, consider a
host attached to router E which must choose between two
retransmitters attached to routers C and G: both are the
same number of hops away. The host attached to C is the
better choice because data received at G must have passed
through E already, and also because C is closer to the source
than G, which generally means there is less chance for
packet loss. Furthermore, the host at C will probably detect
the loss before G because it is closer to the source and
expects the packet sooner; the host at C may then request
the retransmission before the host at G, and be able to
service the request of the host at E sooner. However, using hop
count as a metric in this manner cannot determine whether
a host at C is a better retransmitter than a host at G.

Figure 2: Possible looping from using hop counts.

Figure  1  also  illustrates  a  second  problem  that  may  arise 
in  using  hop  counts  to  select  retransmitters.  The  host  at  E 
must  choose  between  two  retransmitters  attached  to  routers 
C  and  D.  Router  D  may  lie  behind  a  300-baud  modem,  but 
such  information  is  not  available  by  hop  counts  (but  may  be 
revealed  by  propagation  delay  or  packet-loss  statistics). 

Finally, using hop count as the only measure of packet-loss
correlation is subject to a looping condition, as illustrated
by Figure 2. A host attached to router C may choose
a retransmitter attached to router D; that host at D
may in turn choose a retransmitter attached to router E. A loop
of retransmitters forms if the host at E chooses the host at
C as its retransmitter. If the hop count from the source
is used, rather than the hop count between nodes, then the
loop can be prevented, for example, by requiring nodes to find
retransmitters that lie closer to the source than themselves.
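
The loop-avoidance rule just described can be sketched as follows. This is only an illustration and is not taken from TMTP or any other protocol discussed here; the struct candidate, its fields, and the function pick_parent are hypothetical names introduced for this example.

```c
#include <stddef.h>

/* Hypothetical record for a candidate retransmitter. */
struct candidate {
    int hops_from_source; /* hop count from the source to the candidate */
    int hops_to_me;       /* hop count between the candidate and this host */
};

/*
 * Return the index of the nearest candidate that lies strictly closer to
 * the source than this host, or -1 if none qualifies. Because every chosen
 * parent has a strictly smaller source distance than its child, following
 * parent links can never lead back to the starting host.
 */
int pick_parent(const struct candidate *c, size_t n, int my_hops_from_source)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (c[i].hops_from_source >= my_hops_from_source)
            continue; /* not closer to the source: could permit a loop */
        if (best < 0 || c[i].hops_to_me < c[best].hops_to_me)
            best = (int)i;
    }
    return best;
}
```

Note that this rule only removes loops. It still cannot tell whether a candidate shares any part of the requester's path from the source, nor does it reveal link speed.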

The  Reliable  Multicast  Transport  Protocol  (RMTP)  [17] 
also  builds  a  tree  of  retransmitters.  Each  receiver  selects  the 
closest  retransmitter  based  on  propagation  delay.  However, 
the  tree  of  retransmitters  must  be  picked  by  administra¬ 
tors  to  ensure  that  a  loop-free  hierarchy  of  retransmitters  is 
formed. 

In  summary,  retransmitters  chosen  by  hop  count  alone  do 
not  provide  any  sense  of  relative  positioning  between  a  group 
of  nodes,  do  not  protect  against  looping  if  used  incorrectly, 
and  do  not  provide  any  information  regarding  link  speed. 

2.2  Measuring  propagation  delay 

Another  method  used  to  reduce  the  delay  in  receiving  a  re¬ 
transmission  is  to  send  requests  to  the  closest  node  in  terms 
of  propagation  delay:  the  smaller  the  propagation  delay  to 
the  retransmitter,  the  sooner  the  retransmission  will  arrive. 
There  are  a  number  of  caveats  for  using  propagation  delay, 
which  we  discuss  here. 

First of all, propagation delays on the Internet can be
very dynamic and should only be used as an estimate, not
as an exact measure of distance between nodes. As the
distances among the hosts decrease, propagation delay becomes
an inaccurate measure to determine where hosts lie
relative to each other and the source.

Figure 3: Measuring multicast distances incorrectly with
unicast propagation delay. Multicast paths are shown with
gray arrows.

Second,  the  network  resources  used  in  measuring  delay 
must  be  considered.  Measuring  the  delay  between  two  ar¬ 
bitrary  hosts  does  not  cost  much  by  itself;  however,  if  a 
protocol  requires  every  host  to  know  the  propagation  delay 
between  itself  and  the  source,  the  cost  becomes  unscalable. 
For  example,  SRM  requires  that  all  receivers  measure  the 
roundtrip  delay  between  each  other.  Even  if  hierarchical 
structures  are  introduced  in  the  exchange  of  session  mes¬ 
sages  with  delay  information,  the  accuracy  of  such  infor¬ 
mation  degrades  as  the  scalability  of  session  messages  is  in¬ 
creased  by  aggregating  delay  information  [20].  Alternatively, 
the  clocks  of  all  hosts  in  the  session  could  be  synchronized, 
e.g.,  by  using  a  protocol  such  as  the  Network  Time  Protocol 
(NTP)  [12];  however,  propagation  delay  between  hosts  must 
still  be  measured. 

Finally,  not  all  Internet  routers  are  capable  of  routing 
multicast  packets.  This  may  be  the  case  for  a  long  time 
in  the  future,  because  even  as  the  multicast  backbone  in¬ 
creases,  many  network  domain  administrators  purposely  for¬ 
bid  multicast  traffic  through  certain  routers.  The  unicast 
path  between  two  hosts  is  not  necessarily  the  same  as  the 
multicast  path  between  the  same  hosts.  Therefore,  measur¬ 
ing  propagation  delay  between  two  hosts  via  unicast  paths 
does  not  accurately  measure  the  delay  between  the  two  on 
a  multicast  tree.  If  a  protocol  depends  on  delay  to  give 
relative  positions  on  the  multicast  tree,  then  the  protocol 
must  measure  multicast  propagation  delay  in  order  to  find 
the  closest  server.  In  other  words,  if  unicast  propagation 
delay  is  the  measure,  unicast  retransmissions  must  be  sent 
(as  in  STORM  [22]).  Similarly,  if  multicast  retransmissions 
are  to  be  sent  from  the  closest  server,  then  the  multicast 
propagation  delays  must  be  measured  (as  in  SRM  [5]). 

If a unicast-measured retransmitter uses multicasts to
send the retransmissions, then the retransmitter might not
be the closest host on the multicast tree, or even one that
lies between the requester and the source (in other words,
even the source may have been a better choice). Figure 3
illustrates that using a roundtrip unicast propagation delay
can lead to choosing by mistake the farthest host on the
multicast tree as the closest. Notice that the unicast path
from the router attached to the source to the host at router
E (through router Z) is shorter than the unicast path from
the host attached to router C to router E (through routers
X and Y). However, retransmissions multicast from C will
reach E before those multicast from the source’s router do,
and unicast measurements will not detect this.


2.2.1  Multicasting  or  unicasting  retransmissions 

Regardless  of  what  metric  is  used  to  choose  the  retransmit¬ 
ter,  deciding  between  unicast  retransmissions  and  multicast 
transmissions  introduces  its  own  set  of  problems.  Unicast 
retransmissions  trickle  down  the  tree  from  retransmitter  to 
retransmitter  (e.g.,  the  STORM  protocol  unicasts  retrans¬ 
missions  despite  its  emphasis  on  deadline-oriented  data); 
multicast  retransmissions  have  been  shown  to  lower  aver¬ 
age  packet  delay  from  reliable  transmission  [18].  Further¬ 
more,  unicast  retransmissions  are  a  waste  of  bandwidth; 
sending  multiple  copies  of  the  same  data  is  inefficient,  es¬ 
pecially  since  a  multicast  routing  tree  already  connects  the 
receivers.  To  make  up  for  the  delay  from  unicast  retrans¬ 
mission,  STORM  allows  nodes  to  change  parents  if  the  delay 
in  receiving  the  retransmissions  is  too  long,  even  though  a 
parent  may  be  waiting  for  the  retransmission  from  a  grand¬ 
parent.  As  congestion  and  packet  loss  increase,  the  orga¬ 
nization  of  receivers  may  degrade  into  a  scenario  in  which 
every  host  sends  Nacks  directly  to  the  source. 

On the other hand, multicasting retransmissions can also
waste bandwidth and processing resources if the packets
reach receivers that do not need the retransmitted data.
Therefore, multicast retransmissions may save resources, but
only if the receivers that require the data are the only
recipients. Unfortunately, restricting the scope of a multicast
packet using existing multicast technology can only be
achieved by either forming and maintaining a new multicast
group (which is costly), or by reducing the number of routers
a packet may traverse by setting the time-to-live (TTL) field
present in all IP packets to an explicit limit. Reducing the
TTL of a multicast packet is crude as it lacks direction, i.e.,
packets are still disseminated in all directions from the
retransmitter.

Another  issue  with  multicasting  retransmissions  is  that 
only  one  retransmitter  should  multicast  data  to  the  same  set 
of  receivers.  However,  this  is  not  the  case  in  SRM,  which 
allows  any  receiver  to  answer  a  request  after  waiting  for  a 
backoff  interval  to  check  if  other  hosts  have  already  acted. 
The  advantage  of  multicasting  retransmissions,  as  in  SRM,  is 
that  some  hosts  may  get  retransmissions  before  they  request 
them.  However,  because  SRM  uses  a  probabilistic  approach, 
multiple  retransmitters  may  act  on  a  request. 

In  summary,  multicasting  retransmissions  can  be  more 
efficient  than  unicasting  retransmissions,  provided  that  only 
one  retransmitter  responds,  and  that  the  scope  of  the  re¬ 
transmissions  is  only  to  receivers  that  require  the  data.  This 
calls  for  a  deterministic  approach  to  multicast  retransmis¬ 
sions,  as  well  as  the  ability  to  accomplish  subcasting,  i.e., 
multicasting  to  a  specific  subset  of  a  multicast  group. 

2.3  Using  router  support 

Both  the  Reliable  Multicast  Architecture  (RMA)  [9]  and 
the  Error  Control  Scheme  for  Large-Scale  Multicast  Appli¬ 
cations  (ECSLMA)  [16]  organize  the  receiver  set  of  a  mul¬ 
ticast  session  into  a  tree  by  adding  extra  functionality  to 
multicast  routers.  Rather  than  storing  data  at  the  routers  of 
a  multicast  tree,  the  two  protocols  actually  provide  special 
routing  support  that  can  be  exploited  for  the  distribution 
of  acknowledgments  and  retransmissions  needed  in  reliable 
multicast  protocols. 

RMA and ECSLMA have two advantages over approaches
that use end-to-end techniques: the receivers’ Nacks and
Acks are automatically routed to a nearby receiver in a loop-free
tree structure, and retransmissions can be sent to a
subtree of the primary multicast tree, which is quicker and
more efficient. RMA and ECSLMA do not require the expanding
ring searches that virtually all reliable multicast protocols
to date use to coordinate the receivers of the session.

Figure 4: For a host attached at router E, hosts attached to
router D are directly acceptable; hosts at router C are indirectly
acceptable; and hosts at router X are unacceptable.

While  the  approaches  taken  by  RMA  and  ECSLMA  are 
very  similar,  the  differences  lie  in  their  operation.  We  detail 
the  mechanisms  used  in  RMA  briefly  because  Tracer  specif¬ 
ically  adapts  the  mechanisms  used  in  RMA.  The  differences 
in  operation  between  RMA  and  ECSLMA  mainly  concern 
issues  that  Tracer  solves  in  other  ways,  and  are  not  of  con¬ 
cern  here;  for  example,  working  over  shared-tree  multicast 
protocols  (such  as  CBT  and  OCBT  [1,  21]),  and  avoiding 
sending  redundant  retransmissions  to  some  receivers. 

RMA is built on the concept we call acceptability, which
considers whether a retransmitter is a good choice for a
requester based on what percentage of the path they share
from the source. If two hosts share a common path on the
routing tree, then there exists a packet-loss correlation
between the two hosts for any packets lost on the common
path (see Section 5). More formally, the relationship between
a retransmitter and a requester is either unacceptable
(they share no common path from the source), or one of the
following [9]:

Directly  acceptable:  The  relationship  between  a  receiver 
A  and  its  retransmitter  B  is  directly  acceptable  if  the 
router  attached  to  B  lies  on  the  path  from  the  source 
to  the  router  attached  to  A. 

Indirectly  acceptable:  The  relationship  between  a  receiver 
A  and  its  retransmitter  B  is  indirectly  acceptable  if 
the  routers  attached  to  B  and  A  share  a  common  path 
from  the  source,  and  the  router  attached  to  B  is  closer 
to  the  source  than  the  router  attached  to  A. 

Figure 4 illustrates all three cases. For a host attached
at router E, hosts attached to router D are directly acceptable
retransmitters because packets must travel through D
before reaching E. For the same host attached to router
E, hosts at router C are indirectly acceptable retransmitters
because a common path (Src → A → B) is shared between
C and E, and C is closer to the source than E. All hosts
attached to router X are unacceptable retransmitters since
packets from the source to X travel a completely different
path than those traveling to E. (Note that additional factors
can influence the choice of parent in RMA, ECSLMA,
and Tracer, including perceived packet loss.)
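
The three cases can be expressed as a comparison of two source-to-host router lists. The sketch below is illustrative only and is not the procedure of RMA or of Tracer; the enum relation, the functions common_prefix and classify, and the representation of a path as an array of router names with the source's router at index 0 are all assumptions made for this example. It also assumes that two hosts on the same router count as directly acceptable, which a real protocol would need to pair with a tie-breaking rule.

```c
#include <stddef.h>
#include <string.h>

enum relation { UNACCEPTABLE, INDIRECTLY_ACCEPTABLE, DIRECTLY_ACCEPTABLE };

/* Number of leading entries the two source-to-host paths have in common. */
static size_t common_prefix(const char *const *a, size_t na,
                            const char *const *b, size_t nb)
{
    size_t i = 0;
    while (i < na && i < nb && strcmp(a[i], b[i]) == 0)
        i++;
    return i;
}

/*
 * Classify retransmitter path rt against requester path rq. Each path lists
 * router names in source-to-host order, with the source's router as entry 0.
 */
enum relation classify(const char *const *rq, size_t nrq,
                       const char *const *rt, size_t nrt)
{
    size_t shared = common_prefix(rq, nrq, rt, nrt);

    if (shared == 0)
        return UNACCEPTABLE;          /* not even the same source router */
    if (shared == nrt)
        return DIRECTLY_ACCEPTABLE;   /* rt's router lies on rq's path */
    if (shared <= 1)
        return UNACCEPTABLE;          /* paths agree only at the source */
    if (nrt < nrq)
        return INDIRECTLY_ACCEPTABLE; /* shared prefix, rt closer to source */
    return UNACCEPTABLE;
}
```

Under one assumed reading of Figure 4 (E reached via Src, A, B, D; C via Src, A, B; X directly below the source's router), classify reports D as directly acceptable, C as indirectly acceptable, and X as unacceptable for a requester at E.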

The advantage of organizing the receiver set into a tree
such that every retransmitter-requester relationship is
acceptable, as in RMA, is that all receivers that are downstream
on the retransmitter-requester tree are also downstream
on the multicast routing tree [9]. For example, in
Figure 4, if C is the retransmitter for D, and D is the
retransmitter for E, multicasts from C towards D (i.e.,
traveling from the common router B and away from the source)
will reach all of D’s children, and no hosts that are not
D’s children. We refer to a transmission to a portion of the
multicast tree, e.g., starting at B and traveling away from the
source (or root) and the incoming interface, as a subcast.

While  the  approach  of  requiring  acceptability  is  good, 
RMA  and  ECSLMA  require  several  additions  to  the  pro¬ 
tocols  used  at  multicast  routers.  In  contrast,  Tracer  also 
builds  a  completely  acceptable  tree  of  receivers,  but  with¬ 
out  requiring  any  changes  to  routers.  Just  like  RMA  and 
ECSLMA,  Tracer  is  able  to  identify  the  specific  router  from 
which  subcast  retransmissions  should  be  sent. 

2.4  Exchanging  data  of  observed  performance 

A novel approach to organizing the receiver set is considered
by the STORM [22] protocol, where receivers pick the best
retransmitter based on observed performance, propagation
delay, and buffer size. Each potential retransmitter is evaluated
based on what percentage of the packets can be received
at the requester before the buffer runs out. Specifically, each
potential retransmitter is asked to determine the percentage
of packets it has received correctly from the source within
time t + B - d, where t is the average delay from the source
to the requester, B is the buffer size at the requester (in
time units), and d is the unicast propagation delay between
requester and retransmitter.
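
The evaluation rule above amounts to a deadline computation followed by a count. The following sketch illustrates that arithmetic only; it is not STORM's implementation, and the function names, the use of doubles, and the convention that a negative delay marks a packet the candidate never received are assumptions made here.

```c
#include <stddef.h>

/* Deadline t + B - d, with all three values in the same time unit. */
double storm_deadline(double t, double buffer, double d)
{
    return t + buffer - d;
}

/*
 * Fraction of packets whose source-to-candidate delay met the deadline.
 * A negative delay marks a packet the candidate never received.
 */
double fraction_usable(const double *delay, size_t n, double deadline)
{
    size_t ok = 0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (delay[i] >= 0.0 && delay[i] <= deadline)
            ok++;
    return (double)ok / (double)n;
}
```

For instance, with t = 100, B = 50, and d = 30 (all in milliseconds), the deadline is 120; a candidate whose four packets arrived after 80, 110, and 130 ms, with one never received, would report a fraction of 0.5.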

The  STORM  resilient  multicast  paradigm  works  well, 
and  includes  a  labeling  system  to  avoid  looping  of  retrans¬ 
mitters.  However,  the  STORM  receiver  set  organization  is 
not  set  up  properly  to  do  efficient  multicast  retransmissions 
(via  either  subcasts  or  separate  multicast  groups).  Mul¬ 
ticast  retransmissions  do  not  work  with  STORM,  because 
measurements  are  performed  using  unicast  paths,  and  there 
is  nothing  about  measuring  hop  count,  propagation  delay,  or 
packet-loss  correlation  that  deterministically  ensures  an  ac¬ 
ceptable  acknowledgment  tree;  nor  can  any  of  those  heuris¬ 
tics  discover  the  name  of  the  router  to  send  the  subcast 
retransmission  from. 

3  Tracing 

We  propose  a  new  protocol,  called  Tracer,  which  relies  upon 
a  function  built  into  all  IP  multicast  routers  running  the 
Internet  Group  Management  Protocol  (IGMP)  [4].  IGMP 
specifies  a  special  packet  called  “MTRACE”  (multicast  trace) 
that  allows  any  host  to  trace  its  path  to  the  source  (root) 
of  a  multicast  routing  tree  for  a  specified  number  of  hops. 
The commonly used Unix tool Mtrace is also based on the
MTRACE function (see footnote 1).

Unlike hop counts, propagation delays, or exchanging
performance statistics, tracing allows receivers to definitively
discover if they share a common path from the source by
comparing path-strings returned by MTRACE queries.
Accordingly, tracing can be used by receivers to create acceptable
relationships between retransmitters and requesters, just
as in RMA. But, unlike the mechanisms used in RMA, tracing
is available without modification of routers.

Figure  5  illustrates  the  MTRACE  process.  Because  each 
multicast  router  knows  the  interface  that  leads  to  the  source 
of  the  tree,  the  MTRACE  query  traverses  each  router  in 
receiver-to-source  order,  recording  the  path  in  the  data  por¬ 
tion  of  the  packet.  Once  the  source  is  reached,  the  packet 
is  sent  back  to  the  receiver  as  a  standard  unicast  packet. 
Hosts  that  share  a  common  path  will  see  common  network 
performance  characteristics,  as  we  show  in  Section  5. 
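
The hop-by-hop recording can be mimicked on a toy topology in which every router knows only its upstream neighbor. This sketch is a simulation for illustration, not the IGMP packet format or router behavior; the array upstream, the function trace_to_source, and the choice to return 0 when the hop limit is reached are assumptions made here.

```c
#include <stddef.h>

/*
 * upstream[r] is the index of the next router toward the source, or -1 for
 * the source's own router. Records the routers visited, starting at start,
 * in receiver-to-source order. Returns the number of routers recorded, or 0
 * if the source is not reached within max_hops entries.
 */
size_t trace_to_source(const int *upstream, int start,
                       int *path, size_t max_hops)
{
    size_t n = 0;
    for (int r = start; r >= 0; r = upstream[r]) {
        if (n == max_hops)
            return 0; /* hop limit reached before the source */
        path[n++] = r;
    }
    return n;
}
```

Reversing the recorded array gives the source-to-receiver order in which two hosts' paths can be compared for a common prefix.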

1 The source code for Mtrace can be found at
ftp://ftp.isi.edu/mbone/mtrace.tar.Z


Figure  5:  Routers  D  and  F  send  an  MTRACE  along  the 
reverse  path  to  the  source.  The  source’s  router  returns  the 
recorded  path-string  via  unicast. 

Tracing provides requester and retransmitter with the
exact name of the last router common to both, which enables
deterministic grouping of receivers with exact packet-loss
correlation, and is precisely the missing information required
to do subcasting. Propagation delay and hop counts
cannot provide this information, which is possibly why,
although sometimes mentioned, subcasting has not been
explored before now as a practical option. The introduction of
subcasting to Internet routers would require minimal local
modification: no additional routing protocols or signaling
are required. In fact, in order to perform subtree multicasting,
all a router needs to do is send the packet out on all
multicast interfaces appropriate for the group, except the
interface leading to the source and the interface over which
the subcast packet arrived.
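
The forwarding rule for a subcast can be written as a simple filter over a router's interface list. The sketch below is a hedged illustration of that rule and not router code; the function subcast_out_ifaces and the use of small integers as interface identifiers are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Copy into out every interface in group_ifaces (the router's multicast
 * interfaces for the group) except the one leading to the source and the
 * one the subcast packet arrived on. Returns the number of interfaces
 * written; out must have room for n entries.
 */
size_t subcast_out_ifaces(const int *group_ifaces, size_t n,
                          int iface_to_source, int iface_arrival,
                          int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        int ifc = group_ifaces[i];
        if (ifc == iface_to_source || ifc == iface_arrival)
            continue; /* never send back upstream or back to the sender */
        out[k++] = ifc;
    }
    return k;
}
```

No per-subcast state is kept, which is consistent with the claim that only a local modification is needed.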

In  the  absence  of  a  subcasting  interface,  the  groups  of  re¬ 
ceivers  organized  by  Tracer  can  still  form  separate  multicast 
groups. 

Tracer  allows  only  acceptable  relationships  between  re¬ 
transmitters  and  requesters,  and  because  the  last  common 
router  between  them  is  known,  Tracer  is  able  to  do  effi¬ 
cient  multicast  retransmissions  once  subcast  transmissions 
become  available  in  the  Internet. 

Tracer has four stages during which a tree of retransmitters
is constructed and maintained by the hosts participating
in a multicast group: path discovery, path advertisement,
parent selection, and maintenance.

The goal of the tree-building algorithm is for each receiver
to pick a retransmitter that shares a common path
to the source, and is closer to the source. The choice of
retransmitter is augmented by such performance measures as
propagation delay on the multicast routing tree and packet loss.

3.1  Path  Discovery  and  Advertisement 

Path  discovery  begins  at  each  receiver  by  sending  a  unicast 
MTRACE  message  toward  the  source’s  router.  When  the 
unicast  response  to  the  MTRACE  message  returns,  each  re¬ 
ceiver  then  knows  its  path-string  to  the  source.  A  path-string 
is  a  list  of  interfaces  passed  through  by  packets  transmitted 
by  the  source  to  the  receiver  for  a  multicast  group.  Figure  5 
illustrates  this  process. 

Figure 6: Path advertisement by the host attached to router
F via ERS. The source, and hosts attached to routers C and
D answer in this example.

Once the path-string is known, path advertisement begins:
each receiver multicasts a PATH_ADV message advertising
its path-string, as illustrated in Figure 6. We refer to
such hosts as advertisers. Path advertisements also include
the latest packet-loss statistics, as well as a timestamp.
Advertisement is performed as an expanding ring search (ERS):
the multicast is sent with a limited hop count at first, and if
no response is received, the advertisement is sent again with
a larger hop count. The maximum value of the ERS should
not be larger than the hop count to the source (the explicit
hop count can be derived from the path-string), but because
of the path asymmetries between receivers communicating
via source-based multicast routing trees, this should not be
a hard limit. If the route taken by the advertiser’s multicast
is different from the source’s multicast path, it is not
a problem because only the path-strings (and packet loss)
are compared, not round-trip times or hop counts between
receivers on asymmetric paths.
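
The expanding ring search itself is a short loop. The sketch below is illustrative and is not Tracer's implementation: the doubling schedule, the callback type advertise_fn, and the stand-in callback responder_within (which replaces real network I/O) are all assumptions made here. A caller might pass the hop count derived from its path-string, plus some slack, as max_ttl, since the text treats that bound as a soft limit.

```c
/*
 * Hypothetical callback: multicast one PATH_ADV with the given TTL and
 * return nonzero if at least one response arrived before a timeout.
 */
typedef int (*advertise_fn)(int ttl, void *ctx);

/*
 * Expanding ring search. Starts at first_ttl, doubles the TTL after each
 * silent round, and stops at max_ttl. Returns the TTL that drew a
 * response, or -1 if every round was silent or the arguments are invalid.
 */
int expanding_ring_search(advertise_fn advertise, void *ctx,
                          int first_ttl, int max_ttl)
{
    if (max_ttl > 255)
        max_ttl = 255;            /* the IP TTL field is 8 bits wide */
    if (first_ttl < 1 || max_ttl < first_ttl)
        return -1;
    for (int ttl = first_ttl; ; ttl *= 2) {
        if (ttl > max_ttl)
            ttl = max_ttl;        /* final round uses the cap itself */
        if (advertise(ttl, ctx))
            return ttl;
        if (ttl == max_ttl)
            return -1;
    }
}

/* Stand-in for network I/O: pretends the nearest responder is *ctx hops away. */
int responder_within(int ttl, void *ctx)
{
    return ttl >= *(const int *)ctx;
}
```

With a nearest responder 5 hops away, a search from TTL 1 with a cap of 16 tries 1, 2, 4, and 8 and succeeds at 8; with a cap of 4 it gives up after the third round.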

Multicasting the advertisement, even as an ERS, is inefficient.
However, without additional router support, there is
no other way for receivers to discover each other’s existence
in the multicast group. ERS by itself lacks the notion of
directionality and hence simply using ERS may lead to the
choice of a parent that is downstream from the child (see
Section 2.1). Tracer, on the other hand, combines ERS with
path-strings to obtain a precise idea of direction. Thus,
Tracer will never choose a parent that is farther from the
source than the child is from the source.

The  fundamental  difference  in  the  way  Tracer  uses  ERS 
with  respect  to  other  protocols  is  that  Tracer  uses  ERS  to 
advertise  its  selection  criteria,  while  previous  protocols  use 
ERS  as  an  integral  part  of  the  selection  criteria  (e.g.,  [5,  23]). 

Upon  reception  of  another  host’s  PATH_ADV  message, 
a  receiver  must  decide  whether  it  can  respond  as  a  will¬ 
ing  parent,  based  on  the  acceptability  definition.  A  re¬ 
ceiver  should  only  respond  to  it  if  it  can  serve  as  an  ac¬ 
ceptable  parent,  i.e.,  if  the  responder  is  closer  to  the  source 
(in  terms  of  hop  counts)  and  a  common  path  exists.  This 
can  be  easily  decided  by  comparing  the  path  strings.  Hosts 
that  do  not  wish  to  be  retransmitters  simply  do  not  re¬ 
spond  to  PATH_ADV  messages.  Willing  parents  send  a 
PATH_RESPONSE  message  unicast  to  the  advertiser,  which 
includes  the  parent’s  path  string,  latest  packet-loss  statis¬ 
tics,  and  the  PATH_ADV’s  timestamp. 

The  timestamp  is  used  to  measure  potential  parents  dur¬ 
ing  failsafe  mode,  described  subsequently,  or  when  deliver¬ 
ing  retransmissions  in  a  timely  manner  is  an  issue  (see  Sec¬ 
tion  4.2). 

Once  one  or  several  PATH_RESPONSEs  are  received, 
an  advertiser  may  begin  parent  selection.  In  fact,  when 
any  PATH_ADV  message  is  received,  and  if  the  advertiser  is 
determined  to  be  an  acceptable  parent,  then  parent  selection 
may  also  begin. 

Lost  PATH_RESPONSEs  or  PATH_ADVs  are  not  a  prob¬ 
lem  as  the  tree  of  receivers  is  repaired  by  the  maintenance 
stage  of  Tracer.  Therefore,  such  messages  do  not  need  to  be 
sent  reliably. 

The procedure in C code used to compare two path-strings
appears in Figure 7. The function is_acceptable
returns 0 if the paths only match at the sender (and neither
host is the sender itself), or if no part of the paths
is in common. Otherwise, the number of hops between the
parent and child is returned: a positive value indicates that
the host advertising pathA is an acceptable parent for the
host advertising pathB; a negative value indicates that the
host advertising pathB is an acceptable parent for the host
advertising pathA.

    int is_acceptable(u_long *pathA, int alen, u_long *pathB,
                      int blen, u_long sender) {
      /* Shorter list searches the larger one */
      unsigned long *shorter, *longer;
      int slen, llen, i, j, sh, lg;

      if (alen < blen) {
        shorter = pathA; slen = alen-1;
        longer = pathB; llen = blen-1;
        sh = 1; lg = -1;
      }
      else {
        shorter = pathB; slen = blen-1;
        longer = pathA; llen = alen-1;
        sh = -1; lg = 1;
      }
      for (i=0; i<=slen; i++) {
        for (j=0; j<=llen; j++) {
          if (shorter[i]==longer[j]) {
            if ((shorter[i]==sender)
                && (shorter[0] != sender)
                && (longer[0] != sender))
              /* if a match is made only at the sender
                 and neither node is actually sender */
              return(0);
            else { /* if shorter is closer to common node */
              if (i < j)
                return (sh*(j+i));
              else
                return (lg*(j+i));
            }
          }
        }
      }
      return(0); /* no match found */
    }

Figure 7: C code for determining the acceptability of two
hosts based on path-strings.
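
As a worked example of these return values, the sketch below restates the comparison compactly so that it is self-contained, and applies it to a small made-up topology. The router identifiers (R_S, R_A, and so on), the sample paths, and the assumption that a path-string is stored as an array running from the host's own router (index 0) to the source are all hypothetical and are not taken from the figures.

```c
/* Compact restatement of the path-string comparison so the example is
 * self-contained.  A path-string is assumed to be an array that starts at
 * the host's own router and ends at the source. */
int is_acceptable(const unsigned long *pathA, int alen,
                  const unsigned long *pathB, int blen,
                  unsigned long sender)
{
    const unsigned long *shorter, *longer;
    int slen, llen, i, j, sh, lg;

    if (alen < blen) {
        shorter = pathA; slen = alen; longer = pathB; llen = blen;
        sh = 1; lg = -1;
    } else {
        shorter = pathB; slen = blen; longer = pathA; llen = alen;
        sh = -1; lg = 1;
    }
    for (i = 0; i < slen; i++) {
        for (j = 0; j < llen; j++) {
            if (shorter[i] != longer[j])
                continue;
            /* Paths meet only at the sender and neither host is the sender. */
            if (shorter[i] == sender && shorter[0] != sender
                && longer[0] != sender)
                return 0;
            /* Sign says which host is the parent; magnitude is the hops. */
            return (i < j ? sh : lg) * (i + j);
        }
    }
    return 0;   /* no router in common */
}

/* Hypothetical router identifiers and paths (not taken from the figures). */
enum { R_S = 1, R_A, R_C, R_D, R_E, R_F, R_X };
static const unsigned long path_C[] = { R_C, R_A, R_S };            /* host at C */
static const unsigned long path_D[] = { R_D, R_C, R_A, R_S };       /* host at D */
static const unsigned long path_F[] = { R_F, R_E, R_C, R_A, R_S };  /* host at F */
static const unsigned long path_X[] = { R_X, R_S };                 /* other branch */
```

Under these assumptions, is_acceptable(path_C, 3, path_F, 5, R_S) returns 2 (the host at C is an acceptable parent for the host at F, two hops away), swapping the arguments returns -2, and is_acceptable(path_X, 2, path_F, 5, R_S) returns 0 because the two paths meet only at the source.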

3.2  Parent  selection 

Parent  selection  is  based  on  tracing  and  the  packet-loss  rate 
of  a  potential  parent.  Upon  receiving  a  response  from  any 
potential  parent,  advertisers  check  if  the  potential  parent 
is  a  better  choice  than  their  current  retransmitter,  if  any. 
Whichever  potential  parent  is  closer  on  the  multicast  tree  is 
the  best  choice,  and  is  determined  by  comparing  the  path 
strings,  barring  packet  loss,  which  we  discuss  below.  An 
IM_YOUR_CHILD  message  is  sent  to  the  best  parent  con¬ 
firming  the  relationship,  and  the  parent  stores  its  child’s 
path-string  included  in  the  IM_YOUR_CHILD  message  for 
use  later  in  the  protocol.  A  NOT_YOUR_CHILD  message 
sent  unicast  to  the  old  parent  terminates  the  relationship. 

Periodic  IM_YOUR_PARENT  messages  must  be  sent  from 
a  retransmitter  to  all  its  immediate  children  to  ensure  the 
relationship  is  not  broken  by  network  partitions,  or  other 
problems.  If  a  child  does  not  hear  from  its  parent  after 
several periods, then path advertisement begins again. An
initial IM_YOUR_CHILD message expects an immediate
IM_YOUR_PARENT message from the new parent, and if no
response is heard from the parent after several
retransmissions, the child reverts to path advertisement.

Tracing provides a topology-based solution, but because
topology alone does not always yield the best solution, Tracer
also considers the latest packet-loss measurements of a
receiver. In addition, a receiver may decline to be a
parent, which helps in load balancing. Tracer
can also be extended to consider propagation delay, which
is useful for the delay-oriented protocols described in
Section 4.2.

Figure 8: Because router D is behind a lossy link, hosts at
F choose retransmitters at C.

For  example,  if  the  closest  receiver  is  behind  a  lossy  or 
slow  link  that  is  not  part  of  the  common  path,  a  next-closest 
ancestor  may  be  chosen,  as  illustrated  in  Figure  8.  A  host  at 
F may choose between retransmitters at D or C; notice that
D  is  a  closer  router,  but  it  is  behind  a  lossy  link  and  the  host 
at  C  is  a  better  choice.  Given  that  both  hosts  are  acceptable 
and  that  the  host  at  D  has  a  higher  packet-loss  rate  than 
the  host  at  C,  the  host  at  F  may  choose  the  host  at  C  as  its 
parent.  In  general,  if  an  acceptable  parent  experiences  more 
losses  than  its  child,  the  parent  is  probably  behind  a  lossy 
link  that  is  not  a  part  of  the  path  that  the  parent  and  child 
share  from  the  source.  In  the  worst  case,  every  receiver  will 
wish  to  have  the  source  be  its  retransmitter,  or  one  retrans¬ 
mitter  may  become  overwhelmed  with  too  many  children. 
To  prevent  this  scenario,  overloaded  retransmitters  may  re¬ 
ject  new  children  who  have  closer  choices.  Because  the  path 
strings  of  all  children  are  known  to  a  parent,  when  the  load 
becomes  too  high,  grandchildren  can  be  kicked  out  of  the 
group  first  with  a  NOT_YOUR_PARENT  message.  For  ex¬ 
ample,  the  host  at  C  may  reject  the  host  at  F's  request,  and 
force  it  to  reply  to  the  host  at  D. 
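
The interplay between distance and packet loss in parent selection can be sketched as follows. This is a minimal illustration, not the protocol's specified rule: the struct candidate type, the function choose_parent, and the exact policy (closest candidate whose loss rate does not exceed the child's own, otherwise the least lossy candidate) are assumptions drawn from the reasoning above.

```c
/* A candidate parent as learned from its PATH_RESPONSE.  The struct and
 * field names are hypothetical. */
struct candidate {
    int hops;          /* distance returned by the path-string comparison */
    double loss_rate;  /* latest advertised packet-loss rate, 0.0 to 1.0  */
};

/* Returns the index of the preferred candidate, or -1 if n == 0.
 * Assumed policy: prefer the closest candidate whose loss rate does not
 * exceed the child's own (a parent that loses more than its child is
 * probably behind a lossy link off the common path); if every candidate
 * is lossier than the child, fall back to the least lossy one. */
int choose_parent(const struct candidate *c, int n, double own_loss)
{
    int best = -1, fallback = -1, i;

    for (i = 0; i < n; i++) {
        if (fallback < 0 || c[i].loss_rate < c[fallback].loss_rate)
            fallback = i;
        if (c[i].loss_rate <= own_loss &&
            (best < 0 || c[i].hops < c[best].hops))
            best = i;
    }
    return best >= 0 ? best : fallback;
}
```

In the situation of Figure 8, a nearby but lossy host at D would be passed over in favor of the more distant host at C whenever D's loss rate exceeds that of the host at F.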

3.3  Maintenance 

The  maintenance  portion  of  the  protocol  must  consider  four 
types  of  changes  to  the  multicast  routing  tree  topology.  The 
path  of  routers  to  the  source  can  be  altered,  a  retransmitter 
can  leave  the  session,  a  new  host  can  join  the  session,  or  a 
parent’s  packet-loss  rate  may  change. 

To  check  whether  the  path  to  the  source  on  the  routing 
tree  has  changed,  each  receiver  runs  a  trace  periodically. 
However,  a  full  trace  is  unnecessary;  only  a  trace  to  the  last 
router  common  to  itself  and  its  parent’s  path  to  the  source 
is  required.  The  time-to-live  (TTL)  field  of  the  MTRACE 
query  can  be  restricted  by  the  known  hop  count  to  the  last 
common  router.  Any  changes  beyond  that  point  will  not 
change  the  status  of  acceptability. 
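
The restricted trace can be sketched as a search for the last router a receiver shares with its parent. The function names and the array layout (path-strings running from the host's own router at index 0 to the source) are assumptions, as is counting the host's own router as hop 1.

```c
/* Returns the index, within the child's path-string, of the last router
 * the child shares with its parent (the shared router nearest the
 * receivers), or -1 if the paths have nothing in common.  Path-strings
 * are assumed to run from the host's own router (index 0) to the source. */
int last_common_index(const unsigned long *child, int clen,
                      const unsigned long *parent, int plen)
{
    int i, j;

    for (i = 0; i < clen; i++)
        for (j = 0; j < plen; j++)
            if (child[i] == parent[j])
                return i;
    return -1;
}

/* Hop limit for the periodic partial trace.  Counting the host's own
 * router as hop 1 is an assumption about how trace hops are numbered. */
int partial_trace_hops(const unsigned long *child, int clen,
                       const unsigned long *parent, int plen)
{
    int idx = last_common_index(child, clen, parent, plen);

    return idx < 0 ? clen : idx + 1;   /* no common router: full trace */
}
```

A route change beyond the returned hop limit leaves the shared portion of the path intact, which is why it does not affect acceptability.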

Tracer  will  adjust  the  tree  of  retransmitters  with  any 
changes  to  the  multicast  routing  tree,  which  may  induce  a  lot 
of  traffic  if  the  routing  tree  changes  frequently.  We  believe 
this  is  the  right  approach  because  Tracer  was  designed  to 
always  pick  what  it  sees  as  the  best  retransmitter  for  each 
requester;  as  the  routing  tree  changes,  some  retransmitters 
are  no  longer  the  best  choices. 

If the MTRACE response does show that there has been
a change in the path, the receiver must discover if its parent
is still the closest acceptable node. The ERS-based path
advertisement state is entered again with the distance to
the parent as the starting TTL.

In the second case, where a retransmitter leaves the session,
the orphaned children start the protocol over by sending
PATH_ADVs via an ERS.

In the third case, a new receiver may join a session between
an existing parent and child pair. Figure 9 illustrates
this case, where a host at router C has joined a session and
the host at router A is already the parent of the host at
router F. After the host at C joins, it learns through an
advertisement stage that the host at A is its best parent. Note
the difficulty that may arise: the host at C is now the best
parent for the host at F, but there is no way for the host
at F or the host at C to know this. The MTRACE query
from F to the source will not return a different path string,
and if the PATH_ADVs from the host at C do not reach the
host at F (since the TTL needed to reach A from C during
ERS is only 2), then neither C nor F will act on forming
the relationship. To prevent this scenario, whenever a parent
takes on a new child, it sends a copy of the new child’s
PATH_ADV message to all its children (via multicast, if a
separate group exists). In the case of Figure 9, the host at
A would send a copy of the PATH_ADV from the host at C
to the host at F, triggering a parent selection process at the
host at F. (This also repairs the situation where a host’s
PATH_ADV was lost before reaching a potential child.)

Figure 9: A host at router C joins the session.

In the fourth case, the parent’s packet-loss rate may
reach an unacceptable level. Parents advertise such information
in the periodic IM_YOUR_PARENT messages sent to
all children. If the parent’s packet loss rate is much greater
than the child’s, the child may elect to enter the advertisement
phase again to find a better parent. Note that the
comparison with the child’s rate is important: if both parent
and child have increased rates, then the loss is probably
on a common path, and the parent may already be looking
for a new parent. Upon confirmation of a new parent,
a NOT_YOUR_CHILD message terminates the relationship
with the old parent.

3.4 Failsafe operation

Because Tracer relies on the multicast path to remain stable,
and assumes the correct operation of the MTRACE
mechanism, an extra safeguard prevents looping in the tree
if either of the two assumptions is violated. Each receiver
stores a level number, with the source initialized as zero and
receivers without parents initialized to infinity. The level of
each retransmitter increases by one as the tree grows from
the source. No receiver can advertise its path, or accept
children in the tree until the multicast trace has been confirmed
by a second trace at some subsequent time (e.g., less
than a minute). If the paths remain unstable, or there is
no response to an MTRACE query for a number of retries,
receivers resort to a failsafe mode, and look for retransmitters
that are closest, by measuring the sum of the multicast
propagation delay from the parent to the child and the unicast
propagation delay from the child to the parent (just as
STORM-Tracer does in Section 4.2). To ensure loop-free
operation, no requester can join a retransmitter with a
lower-level designation (ties are broken by sorting IP addresses).
Once a join is made, then a child takes on its parent’s level
number plus one. This is the only way a child may change
its level. Increases in level are forwarded down the tree by
a parent to its children. It is straightforward to show that
loop freedom is achieved, because the level numbers render a
complete ordering along the multicast tree from the source.


3.5  Improving  the  Efficiency  of  Tracing 

The  MTRACE  portion  of  IGMP  is  currently  under  revision 
as  a  separate  entity  [2],  focusing  on  using  MTRACE  as  a 
diagnostic  tool  for  individual  receivers  of  a  multicast  group. 
Accordingly,  the  revision  aims  to  avoid  the  scenario  where 
the  source  (and  root)  of  a  routing  tree  initiates  a  trace  that 
is  multicast  to  all  receivers.  Conversely,  because  Tracer  re¬ 
quires  all  receivers  to  trace  their  path,  it  would  be  more  effi¬ 
cient  to  initiate  traces  from  the  source  that  are  multicast  to 
the  receivers.  As  the  source’s  packet  traverses  the  multicast 
tree,  the  path  would  be  recorded  in  the  data  portion  of  the 
packet.  Rather  than  each  receiver  periodically  initiating  an 
MTRACE  query,  Tracer  would  instead  require  each  source 
to  periodically  trace  the  paths  to  the  receiver  set.  Then,  if 
a  receiver  detects  a  change  in  the  path  past  the  last  router 
common  to  its  parent,  it  begins  the  maintenance  portion  of 
the  protocol.  We  are  not  suggesting  that  the  ability  for  a 
single  receiver  to  trace  its  path  from  the  source  be  removed, 
rather  just  that  source-based  multicast  tracing  be  added  to 
IGMP. 

While  shared-tree  multicast  routing  protocols,  like  Or¬ 
dered  Core  Based  Trees  (OCBT)  [21],  are  not  in  wide  use  in 
the  Internet,  we  expect  some  MTRACE  mechanism  will  be 
included  when  they  are  prevalent.  This  section  deals  with 
issues  that  arise. 

The  MTRACE  mechanism  is  built  on  the  assumption 
that  all  routers  know  the  interface  that  leads  to  the  source, 
which  is  also  the  root  of  the  routing  tree.  In  a  shared  tree,  it 
is  not  clear  through  which  interface  an  arbitrary  source  can 
be  reached  and  the  MTRACE  mechanism,  as  it  is  defined 
now,  will  not  work.  There  are  two  solutions  to  this  prob¬ 
lem.  The  first  is  to  simply  perform  multicast  traces  that 
are  initiated  from  each  source,  as  proposed.  The  second  is 
for  each  source  to  trace  the  path  to  the  root  (i.e.,  “core” 
or  “rendez-vous  point”)  of  the  shared  tree.  Each  receiver 
can  then  trace  its  path  to  any  source  by  tracing  its  path 
to  the  root,  and  finding  the  common  router  for  a  particular 
source.  The  two  routes  are  spliced  at  the  common  router 
to  find  the  path  between  source  and  receiver.  The  first  so¬ 
lution  is  obviously  more  efficient  for  multiple  receivers  and 
sources,  but  the  second  solution  is  useful  as  a  diagnostic  tool 
for  individual  receiver-source  pairs. 
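
The second solution, splicing two traces at their common router, can be sketched as below. The function name splice_paths and the array layout (each path starting at the endpoint's own router and ending at the root of the shared tree) are assumptions made for illustration; no such routine is defined by MTRACE.

```c
/* Splices a receiver-to-root path and a source-to-root path at the first
 * router they share, writing the receiver-to-source route into out.
 * Both inputs are assumed to start at the endpoint's own router and end
 * at the root of the shared tree.  Returns the number of routers written,
 * or -1 if the paths share no router or out is too small. */
int splice_paths(const unsigned long *rcv, int rlen,
                 const unsigned long *src, int slen,
                 unsigned long *out, int outcap)
{
    int i, j, k, n = 0;

    for (i = 0; i < rlen; i++) {
        for (j = 0; j < slen; j++) {
            if (rcv[i] != src[j])
                continue;
            if (i + 1 + j > outcap)
                return -1;
            for (k = 0; k <= i; k++)        /* receiver up to common router */
                out[n++] = rcv[k];
            for (k = j - 1; k >= 0; k--)    /* then down toward the source  */
                out[n++] = src[k];
            return n;
        }
    }
    return -1;
}
```

Because both traces end at the same root, the first router of the receiver's trace that also appears in the source's trace is the point where the two branches of the shared tree meet.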

4  Using  Tracer  As  a  Component  of  Larger  Protocols 

The  set  of  proposed  reliable  multicast  protocols  to  date  can 
be categorized based on how each protocol handles error control
and congestion control. Error-control protocols can be
further  divided  by  considering  whether  they  provide  com¬ 
plete  recovery  from  packet  loss  at  the  expense  of  delay,  or 
whether  they  provide  low-delay,  deadline-oriented  delivery 
at  the  expense  of  complete  recovery  from  packet  loss.  Both 
classes  of  error-control  protocols  can  benefit  from  forward 
error  correction  (FEC)  schemes,  and  many  such  schemes 
have been proposed recently (e.g., [14, 15]).


Tracer’s  organization  of  the  receiver  set  is  topology-based, 
and  therefore  can  be  considered  as  a  replacement  component 
for  any  reliable  multicast  protocol  that  needs  to  organize  re¬ 
ceivers  according  to  packet-loss  correlation.  The  rest  of  this 
section  describes  how  Tracer  can  be  used  as  a  component 
of  representative  examples  of  complete-recovery  protocols, 
deadline-oriented  protocols,  and  congestion-control  proto¬ 
cols. 

4.1  Grouping  by  Topology  with  Tracer 

Currently,  RMTP  statically  selects  designated  receivers  (DR) 
to supply retransmissions to the other receivers. A receiver’s
DR (i.e., its parent) is chosen before the session begins,
but is updated automatically during the session should
the DR fail.

Adding  Tracer  as  a  component  to  RMTP  so  as  to  handle 
auto-selection  of  DRs  for  each  receiver  requires  no  modifica¬ 
tions  to  Tracer,  and  Tracer  has  no  effect  on  the  error  control 
mechanisms designed for RMTP. The caveat is that, because
receivers can change DRs during the session, aggregate Acks
must be used in order to ensure complete error recovery;
these are Acks that start from the bottom of the receiver
tree and are aggregated towards the top (see footnote 2).
Aggregate Acks are required for tree-based complete recovery
protocols that make changes to the topology of the
receiver-tree during the session [11].

Tracer  could  also  be  used  as  a  component  of  the  SRM 
protocol.  Rather  than  performing  Nack-avoidance  between 
all  receivers,  Tracer  could  be  used  to  form  separate  groups 
of  receivers,  organized  in  a  tree  structure,  where  Nacks  and 
Nack-avoidance  timers  remain  local  to  each  group.  Nacks 
would  be  forwarded  up  the  tree  to  the  source;  retransmitters 
that  have  already  received  the  missing  data  correctly  could 
subcast  a  repair  down  the  routing  tree. 

4.2  Grouping  by  Deadlines  with  Tracer 

Because  Tracer  is  designed  to  multicast  retransmissions,  it 
is  useful  as  a  resilient  multicast  protocol.  In  other  words,  if 
an  application  does  not  require  100%  of  the  data,  such  as 
in  audio  or  video  services  where  some  loss  is  tolerable,  the 
emphasis  of  a  reliable  protocol  may  be  placed  on  getting  as 
much  of  the  data  as  possible  before  a  deadline  passes.  Mul¬ 
ticast  retransmissions  can  lower  average  packet  delay  [18], 
which  is  desirable  for  the  real-time  streaming  applications 
that  resilient  multicast  protocols  are  designed  to  support. 

In  this  situation,  retransmitters  should  be  chosen  based 
on  what  percentage  of  the  data  is  received  correctly  for  a 
given  time  scale.  This  approach  was  first  proposed  in  the 
STORM  protocol  and  was  called  resilient  multicast.  For 
example,  for  a  requester  that  must  receive  all  data  within 
130ms,  a  retransmitter  that  receives  50%  of  the  data  within 
100ms,  and  65%  within  300ms,  is  actually  a  better  choice 
than  another  retransmitter  that  receives  10%  of  the  data 
within 100ms and 100% within 300ms (assuming they are
both  within  10ms  of  the  requester). 

The STORM protocol [22] relies on unicast retransmissions,
but Tracer’s approach to building the tree of retransmitters
can be adapted to the resilient multicast model,
with the advantage of multicast retransmissions. Just as
in STORM, a STORM-Tracer combination evaluates each
retransmitter based on what percentage of the packets can
be received at the requester before the requester’s buffer
runs out. While the original STORM protocol measures
the unicast delay, each STORM-Tracer retransmitter determines
the percentage of packets it has received correctly from the
source within time t + B - (M + d), where t is the average
delay of a packet sent from the source to the requester, B is the
buffer size of the requester (in time units), M is the multicast
propagation delay between retransmitter and requester
(measured with PATH_ADV messages), and d is the unicast
propagation delay between requester and retransmitter. The
roundtrip time on the multicast tree is not needed because
Acks and Nacks are sent unicast to retransmitters. If subcast
is to be used, then PATH_ADVs should be subcast.
The tradeoff with this approach, as opposed to the original
version of STORM, is that determining the suitability of a
potential retransmitter requires multicast traffic.

Footnote 2: RMTP does not specify aggregate Acks explicitly, but
the Reliable Multicast File Transfer Protocol (RMFTP), an
application based on RMTP, does already use aggregate Acks.

Figure 10: Determining the multicast propagation delay
between hosts.

The measurement of M + d requires three steps, and
Figure 10 illustrates the process. Each multicast PATH_ADV
includes a timestamp, a (step 1 in Figure 10). Requesters
then echo a in their IM_YOUR_CHILD message sent unicast
to the advertiser, which the advertiser receives at time
b (step 2). The advertiser is then able to determine
M + d = b - a. This value is returned to the requester along
with the percentage of packets received correctly by time
b - a (step 3). (Time spent processing at the respective
hosts should be subtracted from the cumulative time.)
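
The two quantities used in this section reduce to simple arithmetic, sketched below. The function names and the choice of milliseconds as the unit are assumptions; the symbols a, b, t, B, M, and d are those defined in the text.

```c
/* All times are in milliseconds; the function names are hypothetical. */

/* M + d as measured by the advertiser: a is the timestamp carried in the
 * PATH_ADV and echoed back, b is the arrival time of the echo, and
 * processing is the time the hosts spent handling the messages. */
double measure_m_plus_d(double a, double b, double processing)
{
    return (b - a) - processing;
}

/* Time window t + B - (M + d): a retransmitter is rated by the fraction
 * of packets it has received correctly from the source within this
 * window. */
double repair_window(double t, double buffer, double m_plus_d)
{
    return t + buffer - m_plus_d;
}
```

For instance, with a = 1000, b = 1012, and 2 ms of processing, M + d is 10 ms; with t = 40 ms and B = 100 ms, the retransmitter is then rated on what it has received within 130 ms.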

4.3  Grouping  by  Congestion  with  Tracer 

Work by De Lucia and Obraczka [3] groups together receivers
with the same state of congestion. Each group has
a  representative  that  is  ideally  the  most  upstream  receiver 
among  those  in  the  group,  because  such  a  receiver  is  most 
likely  located  ahead  of  congested  links.  The  main  problem 
in  congestion  control  is  for  a  multicast  flow  to  share  the 
bandwidth  of  a  congested  link  with  TCP  flows  in  a  fair  way. 
TCP  reacts  to  congestion  within  a  roundtrip  time  and  mul¬ 
ticast  protocols  need  to  do  the  same  thing.  However,  there 
may  be  multiple  bottlenecks  in  a  multicast  flow. 

In  De  Lucia  and  Obraczka’s  scheme,  the  source  builds 
the  representative  set  by  choosing  the  receiver  that  has  not 
Nacked  for  the  longest  period  of  time,  because  it  would  seem 
it has the least congestion. This approach avoids timer traffic
and is simple to implement, but cannot guarantee that
the  choice  of  representative  is  optimal,  because  Nacks  and 
Acks  may  get  lost  during  times  of  congestion.  Handley  has 
proposed  an  extension  to  this  approach  [6]  based  on  the  no¬ 
tion  of  relays  and  subgroups.  Relays  are  receivers  that  buffer 
data  from  the  source  (or  another  relay)  and  re-multicast  the 
data  at  a  slower  rate  to  a  subgroup  of  receivers.  Because 
each  member  of  the  subgroup  leaves  the  main  multicast 
group, the relay acts as a representative of the subgroup on
the main multicast group, sending Acks and Nacks towards
the source. Group formation is based on the Nack-avoidance
algorithm specified by the SRM [5] protocol. If a receiver
notices  that  other  receivers’  Nacks  are  for  the  same  or  a 
superset  of  packets,  the  receiver  reduces  its  Nack  backoff 
timer. If the other receiver’s Nacks are not duplicates, or
are  for  a  subset  of  lost  packets,  then  the  examining  receiver 
increases  its  backoff  timer.  The  hope  is  that  receivers  with 
lower  packet  loss  than  sites  with  which  they  have  correlated 
packet-loss  will  Nack  sooner  and  become  representatives. 
To  find  a  relay,  representatives  start  an  ERS  to  other  re¬ 
ceivers,  looking  for  a  member  with  significantly  lower  loss 
rates.  Once  willing  receivers  are  found,  the  representative 
starts a new multicast subgroup; the relay site is now the
primary source of data for all subgroup members, as they leave
the source’s original group.
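
The backoff adjustment described above can be sketched with loss sets modelled as bitmasks. This is a hypothetical illustration only: the function name, the 64-packet window, and the halve-or-double adjustment are assumptions, since the text does not say by how much the timer changes.

```c
#include <stdint.h>

/* Loss sets are modelled as bitmasks over a window of 64 packets (bit k
 * set means packet k was lost); this representation and the halve/double
 * adjustment are assumptions, not details of the scheme itself. */
double adjust_nack_backoff(double backoff, uint64_t my_lost,
                           uint64_t other_nacked)
{
    /* The other Nack covers everything we lost: same set or a superset. */
    if ((my_lost & ~other_nacked) == 0)
        return backoff / 2.0;
    /* Otherwise it is unrelated to our losses or covers only a subset. */
    return backoff * 2.0;
}
```

A receiver whose losses are a subset of its neighbors' losses thus ends up with a shorter timer and tends to Nack first, which is how it comes to act as a representative.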

Tracer’s selection of retransmitters for each receiver
differs from the non-deterministic approaches employed
by De Lucia and Obraczka’s technique and Handley’s
technique. Tracer, as we have defined it in Section 3, builds
relationships  among  receivers  deterministically,  such  that  a 
receiver  can  always  find  another  receiver  upstream  from  con¬ 
gestion. 

5  Experimental  Results 

To  test  how  well  topological  tracing  predicts  packet-loss  cor¬ 
relation,  we  recorded  data  on  the  Collaborative  Advanced 
Internet  Research  Network  (CAIRN),  a  research  network  of 
about  20  routers  spanning  the  continental  United  States, 
consisting  mainly  of  workstations  running  SunOS  4.1  and 
PCs  running  FreeBSD.  We  chose  to  use  CAIRN  rather  than 
a  few  sundry  hosts  across  the  Internet  because  of  the  open 
access  to  a  large  number  of  hosts,  and  because  we  were  able 
to  collect  data  at  almost  all  points  in  the  multicast  rout¬ 
ing  tree,  rather  than  just  the  end  points.  Figure  11  shows 
the  topology  of  the  13  routers  in  CAIRN  we  used  to  record 
multicast  data. 

To  see  how  packet  loss  was  correlated  over  the  network, 
we  sent  a  steady  stream  of  data  from  the  ucsc  router  to 
the  12  other  routers  in  Figure  11.  We  performed  six  sepa¬ 
rate  sessions  for  manageability,  and  we  were  able  to  collect 
data  at  the  shaded  nodes  of  the  routing  tree  regarding  which 
packets were received; in some sessions, data could not be
collected at a couple of hosts. Figure 12 shows the packet
loss per session for each host. In all, 54,500 packets of about
512 bytes each, or about 27 megabytes of data, were multicast
from the source.

Figure 13 correlates each host’s packet loss with every
host that is a direct ancestor, an indirect ancestor, or an
indirect descendant according to its path in the routing tree,
aggregating all sessions. Each host was compared with another
for only those sessions they had in common. The figure
lists, for each compared host, the packet-loss correlation, the
ranking of the host as Tracer would have chosen retransmitters,
the distance between the two hosts, and the number of
routers they share in common on the path from the source.

Figure 12: Packet loss data for all hosts for each session.

For example, lblpc is compared against four hosts. Both
sri and parc are direct ancestors of lblpc, i.e., they lie
directly on the path to the source from lblpc. As we would
expect, 100% of the packets lost at sri and parc were also
lost at lblpc. Ames is two hops from lblpc and has two
hops in common on the path from the source, presented as
“(2:2)”; 93% of the packets lost at ames were also lost at
lblpc. Ames is in boldface because it is the host that lblpc
would choose as a retransmitter if no directly acceptable
host was available. (Note that because they are actually
equidistant from the source and each other, the tie between ames
and lblpc is broken alphabetically; ames would not choose
lblpc as a parent.) Sun is an indirect descendant of lblpc
because they share a common path from the source, but
lblpc is closer to the source; 69% of the packets lost at sun
were also lost at lblpc.
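
The correlation figure quoted in these examples, the share of packets lost at one host that were also lost at another, can be computed as sketched below. The function name and the per-packet loss arrays are assumptions made for illustration; they are not the measurement tools used on CAIRN.

```c
#include <stddef.h>

/* Fraction of the packets lost at host A that were also lost at host B.
 * lost_a[k] and lost_b[k] are nonzero if packet k was lost at that host.
 * Returns -1.0 when A lost no packets, since the ratio is undefined.
 * The array representation is an assumption for illustration. */
double loss_correlation(const unsigned char *lost_a,
                        const unsigned char *lost_b, size_t n)
{
    size_t k, lost_at_a = 0, lost_at_both = 0;

    for (k = 0; k < n; k++) {
        if (lost_a[k]) {
            lost_at_a++;
            if (lost_b[k])
                lost_at_both++;
        }
    }
    return lost_at_a ? (double)lost_at_both / (double)lost_at_a : -1.0;
}
```

When A is a direct ancestor of B, every packet A misses is also missed by B, so the function returns 1.0, matching the 100% entries reported for direct ancestors.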

As  expected,  the  acceptability  algorithm  chooses  the  host 
with  the  highest  packet-loss  correlation  (when  ignoring  di¬ 
rectly  acceptable  hosts)  in  all  cases,  and  the  ordering  of  the 
other  hosts  based  on  acceptability  exactly  matches  the  or¬ 
dering  based  on  packet-loss  correlation.  Therefore,  we  con¬ 
clude  that  tracing  is  a  good  predictor  of  packet-loss  correla¬ 
tion. 

The  goal  of  choosing  retransmitters  based  on  acceptabil¬ 
ity  is  not  to  find  a  retransmitter  with  the  lowest  packet  loss. 
Rather,  the  goal  is  to  find  the  retransmitter  with  the  highest 
packet-loss  correlation,  and  that  is  the  closest  to  the  source. 
When  this  is  the  case,  the  parent  will  probably  lose  a  packet 
that  the  child  has  lost. 

6  Conclusion 

Tracer addresses both error control and congestion control
aspects of a multicast protocol, and is applicable to reliable
multicast protocols providing complete error recovery,
as well as reliable multicast protocols providing
deadline-oriented recovery. While previous reliable multicast
protocols attempt to discover the topology of the underlying
multicast tree by measuring hop counts or propagation delay,
Tracer provides a mechanism that uses currently available
router functions to record the exact multicast route from
the source to each receiver. Because the path is known,
Tracer is able to select a retransmitter for each receiver
such that packet-loss correlation is maintained between the
hosts. Tracer is the first technique that organizes the
receiver set and maintains packet-loss correlation without
changes to the routers. Therefore, Nacks for missing data
are sent to retransmitters who are closer to the source, and
therefore have a better chance of having the data and will
receive the retransmission faster. Furthermore, Tracer is
the first scheme to identify the last common node between
retransmitters and requester so that subcast transmissions
can become a practical solution, with only small local
additions to routers. Accurately organizing receivers according
to packet-loss correlation ensures that retransmissions can
be multicast because only nodes expecting the retransmission
lie along the multicast subtree. In addition, Tracer
provides elegant mechanisms to isolate congested subtrees
and elect representatives, or relays, for efficient congestion
control in reliable multicast protocols.

Figure 13: Observed packet-loss correlation on CAIRN.
Acceptability ties are broken alphabetically.

We  have  identified  that  extending  IGMP  with  MTRACE 
packets  multicast  from  sources  to  receivers  is  a  desirable 
feature  that  makes  Tracer  much  more  scalable.  We  believe 
that  such  a  feature  can  be  used  by  other  multicast  protocols. 

We have implemented Tracer as a stand-alone application
for FreeBSD computers. The executable is available from
http://www.cse.ucsc.edu/research/ccrg/software/tracer.html